
ASTC decoding part 1: Decoder base and low dynamic range support #46

Draft
Erik-White wants to merge 8 commits into SixLabors:main from Erik-White:astc-decoding-ldr

Erik-White commented May 17, 2026

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict StyleCop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

The first part of ASTC (Adaptive Scalable Texture Compression) decoding. ASTC is used in KTX containers to compress texture data.
This adds the main ASTC decoder with support for low dynamic range (LDR) textures. Implementing an ASTC decoder was a major challenge, but I was able to use ARM's C++ astc-encoder as a reference implementation.
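For readers new to the format, part of the LDR/HDR split is visible directly in the block header. Every ASTC block is 128 bits. A "void-extent" block (bits [8:0] equal to 0x1FC) encodes a single constant colour, and bit 9 says whether its four 16-bit channels are LDR UNORM16 or HDR FP16 values. The Python below is a minimal sketch of that check, written against the ASTC block layout rather than this PR's C# code. `parse_void_extent` is a hypothetical helper, and it ignores the extent coordinates stored in bits [63:12].

```python
import struct

VOID_EXTENT_MARKER = 0x1FC  # bits [8:0] of a 2D void-extent (constant colour) block


def parse_void_extent(block: bytes):
    """Hypothetical helper (not the PR's API): return (is_hdr, (r, g, b, a))
    for a void-extent block, or None for any other block mode.
    The extent coordinates in bits [63:12] are ignored."""
    if len(block) != 16:
        raise ValueError("an ASTC block is always 16 bytes (128 bits)")
    (low,) = struct.unpack_from("<Q", block, 0)  # bits 0-63, little-endian
    if (low & 0x1FF) != VOID_EXTENT_MARKER:
        return None  # not a void-extent block
    is_hdr = bool(low & 0x200)  # bit 9: 1 = FP16 (HDR), 0 = UNORM16 (LDR)
    rgba = struct.unpack_from("<4H", block, 8)  # bits 64-127: R, G, B, A
    return is_hdr, rgba


# An LDR void-extent block holding opaque orange.
ldr = struct.pack("<Q", VOID_EXTENT_MARKER) + struct.pack("<4H", 0xFFFF, 0x8000, 0, 0xFFFF)
print(parse_void_extent(ldr))  # (False, (65535, 32768, 0, 65535))
```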

ASTC is very complex, with many combinations of block modes, block sizes and partitions, so I'm afraid the diff is very large.
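Block size drives both the bit rate and the compressed size. Every block is 16 bytes whatever its footprint, so larger footprints trade quality for fewer bits per texel. A small sketch of that arithmetic for the 2D footprints; `compressed_size` and `bits_per_texel` are hypothetical helpers for illustration, not part of this PR.

```python
# 2D block footprints defined by ASTC (width x height in texels).
ASTC_2D_FOOTPRINTS = {
    (4, 4), (5, 4), (5, 5), (6, 5), (6, 6), (8, 5), (8, 6),
    (8, 8), (10, 5), (10, 6), (10, 8), (10, 10), (12, 10), (12, 12),
}
BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint


def compressed_size(width: int, height: int, block_w: int, block_h: int) -> int:
    """Hypothetical helper: bytes for one 2D ASTC image (one mip level).

    Edge blocks may extend past the image; their extra texels are discarded
    on decode, but each block still costs the full 16 bytes."""
    if (block_w, block_h) not in ASTC_2D_FOOTPRINTS:
        raise ValueError(f"{block_w}x{block_h} is not a 2D ASTC footprint")
    blocks_x = (width + block_w - 1) // block_w  # ceiling division
    blocks_y = (height + block_h - 1) // block_h
    return blocks_x * blocks_y * (BLOCK_BITS // 8)


def bits_per_texel(block_w: int, block_h: int) -> float:
    """Hypothetical helper: average storage cost per texel for a footprint."""
    return BLOCK_BITS / (block_w * block_h)


print(compressed_size(256, 256, 4, 4))  # 65536 (64 x 64 blocks)
print(compressed_size(100, 100, 6, 6))  # 4624 (17 x 17 blocks)
print(round(bits_per_texel(12, 12), 2))  # 0.89
```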


To follow

  • HDR support
  • Round-trip testing against the ARM implementation
  • Benchmarks
  • Fuzz testing

Limitations

  • No support for 3D ASTC block types (4x4x4, etc.)
  • No encoding

Intentionally omitted

  • Swizzle remapping (not useful for ImageSharp.Textures; can be handled downstream)
  • sRGB containers decode to raw UNORM8 (conversion is handled downstream)
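Because sRGB data comes out as raw UNORM8, a downstream consumer that wants linear values has to apply the sRGB transfer function itself. A minimal sketch using the standard piecewise sRGB decoding curve (IEC 61966-2-1); the function names are hypothetical and are not ImageSharp.Textures API. Alpha is stored linearly in sRGB formats, so only RGB go through the curve.

```python
def srgb_unorm8_to_linear(value: int) -> float:
    """Hypothetical helper: sRGB-encoded UNORM8 channel (0-255) -> linear [0, 1].

    Uses the piecewise sRGB decoding curve from IEC 61966-2-1."""
    if not 0 <= value <= 255:
        raise ValueError("UNORM8 values must be in 0..255")
    c = value / 255.0
    if c <= 0.04045:
        return c / 12.92  # linear segment near black
    return ((c + 0.055) / 1.055) ** 2.4  # power-law segment


def srgb_rgba8_to_linear(pixel):
    """Hypothetical helper: RGB go through the sRGB curve; alpha is already
    linear, so it is only rescaled to [0, 1]."""
    r, g, b, a = pixel
    return (
        srgb_unorm8_to_linear(r),
        srgb_unorm8_to_linear(g),
        srgb_unorm8_to_linear(b),
        a / 255.0,
    )


print(round(srgb_unorm8_to_linear(128), 4))  # 0.2159
```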

Test data

Everything used is either created by myself or sourced from:

  • https://github.com/KhronosGroup/KTX-Software (Apache-2.0 license)
  • https://github.com/KhronosGroup/KTX-Software-CTS (Apache-2.0 license)
  • https://github.com/ARM-software/astc-encoder (Apache-2.0 license)

Performance

This implementation compares favourably with the ARM C++ implementation. On this small test set, single-shot benchmarks show it running about 3.8x faster (Ratio 0.26):

| Method         | Categories | Mean      | Error     | StdDev    | Ratio | RatioSD | Allocated | Alloc Ratio |
|--------------- |----------- |----------:|----------:|----------:|------:|--------:|----------:|------------:|
| Reference_Ldr  | LDR        | 10.584 ms | 0.1662 ms | 0.1473 ms |  1.00 |    0.02 |     616 B |        1.00 |
| ImageSharp_Ldr | LDR        |  2.756 ms | 0.0524 ms | 0.0539 ms |  0.26 |    0.01 |     384 B |        0.62 |

However, this includes the allocation overhead of the ARM decoder. Reusing the same decoder context in the ARM decoder gives what is probably a more realistic comparison, in which this implementation takes about 14% longer (Ratio 1.14):

| Method         | Categories | Mean       | Error    | StdDev   | Ratio | RatioSD | Allocated | Alloc Ratio |
|--------------- |----------- |-----------:|---------:|---------:|------:|--------:|----------:|------------:|
| Reference_Ldr  | LDR        | 2,390.6 us | 16.59 us | 13.86 us |  1.00 |    0.01 |         - |          NA |
| ImageSharp_Ldr | LDR        | 2,726.7 us | 36.61 us | 34.25 us |  1.14 |    0.02 |     384 B |          NA |
